Rufus: The Information Sponge

        Typical computer users are connected via worldwide networks to an increasing number of data sources and other users. This interconnectivity provides users with previously unknown riches of available knowledge. Unfortunately, it simultaneously drowns them in a flood of unsorted, unclassified and unusable data. Existing information systems only cure some of information overload's symptoms. File systems, for example, typically treat documents such as electronic mail, project notebooks, and source code as uninterpreted text, with no associated structure or semantics. Retrieval systems search for text patterns rather than semantic attributes (e.g., author). This is especially problematic when searching across heterogeneous file types. Database systems focus primarily on their own internal data representations, providing one-way paths for importing a user's native data.
        What is needed is an environment in which existing applications co-exist with new applications that can operate across existing file types. A framework needs to be developed in which a user's existing data can be described, in both structure and behavior, together with a toolbox upon which new applications can be built. These new applications will use the centralized descriptions to provide a more integrated view of the user's data, and easier access to the data's behavior. The Rufus project has developed an extensible object-oriented data model, storage system, and associated search and display methods for a variety of user file types. The system automatically classifies a user's data files and extracts type-specific attributes. Users may then search, browse, filter, link and display the imported data objects. New file types can be added as new applications arise.
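
        The classify, extract, and search cycle described above can be illustrated with a minimal sketch. It is not the Rufus implementation: every name in it (FileType, DataObject, Sponge, the "mail" and "c_source" types, and their recognition rules) is a hypothetical stand-in chosen for illustration, and the recognition heuristics are deliberately naive.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class FileType:
    """Hypothetical description of one file type: how to recognize it, what to extract."""
    name: str
    recognizes: Callable[[str], bool]
    extract: Callable[[str], Dict[str, str]]


@dataclass
class DataObject:
    """An imported file: its classified type, extracted attributes, and raw text."""
    type_name: str
    attributes: Dict[str, str]
    text: str


def _mail_attributes(text: str) -> Dict[str, str]:
    # Read "Key: value" header lines until the first blank line.
    headers: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            break
        key, sep, value = line.partition(":")
        if sep:
            headers[key.strip().lower()] = value.strip()
    return {"author": headers.get("from", ""), "subject": headers.get("subject", "")}


def _c_attributes(text: str) -> Dict[str, str]:
    # Collect the names of included headers as one comma-separated attribute.
    includes = re.findall(r'#include\s*[<"]([^>"]+)[>"]', text)
    return {"includes": ",".join(includes)}


# Assumed, simplistic recognition rules; a real classifier would be more careful.
MAIL = FileType("mail", lambda t: t.startswith("From:"), _mail_attributes)
C_SOURCE = FileType("c_source", lambda t: "#include" in t, _c_attributes)


class Sponge:
    """Toy store that classifies imported text and answers attribute queries."""

    def __init__(self) -> None:
        self.types: List[FileType] = []
        self.objects: List[DataObject] = []

    def add_type(self, file_type: FileType) -> None:
        # New file types can be registered at any time.
        self.types.append(file_type)

    def import_text(self, text: str) -> DataObject:
        # The first registered type that recognizes the text wins;
        # anything unrecognized falls back to uninterpreted "text".
        for file_type in self.types:
            if file_type.recognizes(text):
                obj = DataObject(file_type.name, file_type.extract(text), text)
                break
        else:
            obj = DataObject("text", {}, text)
        self.objects.append(obj)
        return obj

    def search(self, **wanted: str) -> List[DataObject]:
        # Match on semantic attributes (e.g. author) rather than text patterns.
        return [o for o in self.objects
                if all(o.attributes.get(k) == v for k, v in wanted.items())]


sponge = Sponge()
sponge.add_type(MAIL)
sponge.add_type(C_SOURCE)
sponge.import_text("From: eli\nSubject: demo\n\nHello")
sponge.import_text("#include <stdio.h>\nint main(void) { return 0; }")
sponge.import_text("plain notes")
hits = sponge.search(author="eli")
```

        In this sketch, searching by author returns only the mail object, even though the store also holds source code and unclassified text. Keeping the type descriptions in one registry is what lets a single query operate across heterogeneous file types.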

By: Eli Messinger, Kurt Shoens, John Thomas and Allen Luniewski

Published in: RJ8294 in 1991

LIMITED DISTRIBUTION NOTICE:

This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties).
